Load Libraries and Import Data
# Load necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(readr)
# Read the CSV files
final_data <- read_csv("data/G6.final.data.csv", )
## Rows: 939 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): ID, location, species, climate
## dbl (4): ecotr_id, tree_id, pear_id, quality_index
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
size_data <- read_csv("data/G6.size.data.csv")
## Rows: 207 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): species, climate
## dbl (28): pear_id, tree_id, ecotr_id, week_0, week_1, week_2, week_3, week_4...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
soil_data <- read_csv("data/G6.soil.data.csv")
## Rows: 144 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): species, climate
## dbl (19): ecotr_id, Bulk Density, Infiltration, Soil Porosity, Soil Depth, W...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Quick preview of data structures
head(final_data)
## # A tibble: 6 × 8
## ID location ecotr_id tree_id pear_id species climate quality_index
## <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
## 1 BE_1_1_1 BE 1 1 1 Conference Scenario 1 64
## 2 BE_1_1_2 BE 1 1 2 Conference Scenario 1 35
## 3 BE_1_1_3 BE 1 1 3 Conference Scenario 1 71
## 4 BE_1_1_4 BE 1 1 4 Conference Scenario 1 62
## 5 BE_1_1_5 BE 1 1 5 Conference Scenario 1 71
## 6 BE_1_1_6 BE 1 1 6 Conference Scenario 1 94
head(size_data)
## # A tibble: 6 × 30
## pear_id tree_id ecotr_id species climate week_0 week_1 week_2 week_3 week_4
## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 Conference Scenar… 0 0.6 0.8 1.2 2.3
## 2 2 1 1 Conference Scenar… 0 0.2 0.4 0.5 0.8
## 3 3 1 1 Conference Scenar… 0 0.6 1.1 1.5 1.5
## 4 4 1 1 Conference Scenar… 0 0.3 1.1 1.3 1.3
## 5 5 1 1 Conference Scenar… 0 0.2 1.3 1.4 2.1
## 6 6 1 1 Conference Scenar… 0 0.5 2 2 2.1
## # ℹ 20 more variables: week_5 <dbl>, week_6 <dbl>, week_7 <dbl>, week_8 <dbl>,
## # week_9 <dbl>, week_10 <dbl>, week_11 <dbl>, week_12 <dbl>, week_13 <dbl>,
## # week_14 <dbl>, week_15 <dbl>, week_16 <dbl>, week_17 <dbl>, week_18 <dbl>,
## # week_19 <dbl>, week_20 <dbl>, week_21 <dbl>, week_22 <dbl>, week_23 <dbl>,
## # week_24 <dbl>
head(soil_data)
## # A tibble: 6 × 21
## species ecotr_id climate `Bulk Density` Infiltration `Soil Porosity`
## <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 Conference 1 Scenario 1 1.33 6.87 37.4
## 2 Conference 1 Scenario 1 1.3 1.87 25.0
## 3 Conference 1 Scenario 1 1.55 3.29 34.7
## 4 Doyenne 1 Scenario 1 1.47 4.17 40.6
## 5 Doyenne 1 Scenario 1 1.52 2.91 30.3
## 6 Doyenne 1 Scenario 1 1.59 7.16 40.9
## # ℹ 15 more variables: `Soil Depth` <dbl>, `Water Holding Capacity` <dbl>,
## # `Soil Nitrate` <dbl>, `Soil pH` <dbl>, Phosphorus <dbl>, Potassium <dbl>,
## # Earthworms <dbl>, `Microbial Biomass C` <dbl>, `Microbial Biomass N` <dbl>,
## # `Particulate Organic Matter` <dbl>, `Mineralizable N` <dbl>,
## # `Soil Enzymes` <dbl>, `Soil Respiration` <dbl>,
## # `Total Organic Carbon` <dbl>, pears <dbl>
Data Preparation
# Check for missing values in each dataset
sapply(final_data, function(x) sum(is.na(x)))
## ID location ecotr_id tree_id pear_id
## 0 0 0 0 0
## species climate quality_index
## 0 0 0
sapply(size_data, function(x) sum(is.na(x)))
## pear_id tree_id ecotr_id species climate week_0 week_1 week_2
## 0 0 0 0 0 0 0 0
## week_3 week_4 week_5 week_6 week_7 week_8 week_9 week_10
## 0 0 0 0 0 0 0 0
## week_11 week_12 week_13 week_14 week_15 week_16 week_17 week_18
## 0 0 0 0 0 0 0 0
## week_19 week_20 week_21 week_22 week_23 week_24
## 0 0 0 0 0 0
sapply(soil_data, function(x) sum(is.na(x)))
## species ecotr_id
## 0 0
## climate Bulk Density
## 0 0
## Infiltration Soil Porosity
## 0 0
## Soil Depth Water Holding Capacity
## 0 0
## Soil Nitrate Soil pH
## 0 0
## Phosphorus Potassium
## 0 0
## Earthworms Microbial Biomass C
## 0 0
## Microbial Biomass N Particulate Organic Matter
## 0 0
## Mineralizable N Soil Enzymes
## 0 0
## Soil Respiration Total Organic Carbon
## 0 0
## pears
## 0
#check data type of each column
sapply(final_data, typeof)
## ID location ecotr_id tree_id pear_id
## "character" "character" "double" "double" "double"
## species climate quality_index
## "character" "character" "double"
# Convert 'climate' and 'species' to factors
final_data <- final_data %>%
mutate(
climate = as.factor(climate),
species = as.factor(species),
location = as.factor(location),
ecotr_id = as.factor(ecotr_id),
tree_id = as.factor(tree_id)
)
# Add binary quality indicator (cut-off = 60)
final_data <- final_data %>%
mutate(quality_b = ifelse(quality_index >= 60, 1, 0))
Main Research Question
How Does pear quality varies across the four climate scenarios and
which scenario improves pear quality.
Explor Data
# 1️⃣ Distribution of Quality Index
ggplot(final_data, aes(x = quality_index)) +
geom_histogram(binwidth = 5, fill = "#69b3a2", color = "black") +
theme_minimal() +
labs(title = "Distribution of Quality Index", x = "Quality Index", y = "Count")

# 2️⃣ Boxplot by Species
ggplot(final_data, aes(x = species, y = quality_index, fill = species)) +
geom_boxplot() +
theme_minimal() +
labs(title = "Quality Index by Species", x = "Species", y = "Quality Index")

# 3️⃣ Boxplot by Climate Scenario
ggplot(final_data, aes(x = climate, y = quality_index, fill = climate)) +
geom_boxplot() +
theme_minimal() +
labs(title = "Quality Index by Climate Scenario", x = "Climate", y = "Quality Index")

# 4️⃣ Interaction: Species × Climate
ggplot(final_data, aes(x = climate, y = quality_index, fill = species)) +
geom_boxplot(position = position_dodge(width = 0.8)) +
theme_minimal() +
labs(title = "Interaction: Species × Climate", x = "Climate", y = "Quality Index")

# 5️⃣ Scatterplot: Quality Index by Tree
ggplot(final_data, aes(x = tree_id, y = quality_index)) +
geom_point(color = "steelblue") +
theme_minimal() +
labs(title = "Quality Index per Tree", x = "Tree ID", y = "Quality Index") +
theme(axis.text.x = element_text(angle = 90))

# 6️⃣ Mean ± SD plot by Species and Climate (useful for model design)
final_data %>%
group_by(species, climate) %>%
summarise(
mean_quality = mean(quality_index, na.rm = TRUE),
sd_quality = sd(quality_index, na.rm = TRUE),
.groups = "drop"
) %>%
ggplot(aes(x = climate, y = mean_quality, group = species, color = species)) +
geom_line() +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_quality - sd_quality, ymax = mean_quality + sd_quality), width = 0.2) +
theme_minimal() +
labs(title = "Mean ± SD of Quality Index", y = "Mean Quality Index", x = "Climate Scenario")

- interaction effect was not significant so removed.
fixed effects
fixed effects
Exploratory Data Analysis
ggplot(final_data, aes(x = quality_index)) +
geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
geom_vline(xintercept = 60, linetype = "dashed", color = "red") +
theme_minimal() +
labs(title = "Distribution of Quality Index", x = "Quality Index", y = "Count")

ggplot(final_data, aes(x = factor(quality_b))) +
geom_bar(fill = "steelblue") +
theme_minimal() +
labs(title = "Good vs. Poor Quality Counts", x = "Quality (0 = Poor, 1 = Good)", y = "Count")

final_data %>%
group_by(climate) %>%
summarise(GoodQualityRate = mean(quality_b)) %>%
ggplot(aes(x = climate, y = GoodQualityRate, fill = climate)) +
geom_col() +
theme_minimal() +
labs(title = "Proportion of Good Quality Pears by Climate", y = "Proportion", x = "Climate")
Seem Scenario 2 has more good quality pears followed by scenario 3
head(final_data)
## # A tibble: 6 × 9
## ID location ecotr_id tree_id pear_id species climate quality_index
## <chr> <fct> <fct> <fct> <dbl> <fct> <fct> <dbl>
## 1 BE_1_1_1 BE 1 1 1 Conference Scenario 1 64
## 2 BE_1_1_2 BE 1 1 2 Conference Scenario 1 35
## 3 BE_1_1_3 BE 1 1 3 Conference Scenario 1 71
## 4 BE_1_1_4 BE 1 1 4 Conference Scenario 1 62
## 5 BE_1_1_5 BE 1 1 5 Conference Scenario 1 71
## 6 BE_1_1_6 BE 1 1 6 Conference Scenario 1 94
## # ℹ 1 more variable: quality_b <dbl>
Almost Balanced.
final_data %>%
group_by(climate, species) %>%
summarise(GoodQualityRate = mean(quality_b), .groups = "drop") %>%
ggplot(aes(x = species, y = GoodQualityRate, fill = species)) +
geom_col() +
facet_wrap(~climate) +
theme_minimal() +
labs(title = "Good Quality Proportion by Species and Climate", y = "Proportion")

final_data %>%
group_by(climate, species) %>%
summarise(GoodQualityRate = mean(quality_b), .groups = "drop") %>%
ggplot(aes(x = climate, y = species, fill = GoodQualityRate)) +
geom_tile(color = "white") +
geom_text(aes(label = scales::percent(GoodQualityRate, accuracy = 1)), color = "black", size = 4) +
scale_fill_viridis_c(option = "C", begin = 0.2, end = 0.9) +
theme_minimal(base_size = 13) +
labs(
title = "Heatmap of Good Quality Proportion",
x = "Climate",
y = "Species",
fill = "Proportion\nGood Quality"
)

library(dplyr)
library(tidyr)
library(ggplot2)
# Aggregate good and total pears per tree
tree_yield <- final_data %>%
group_by(tree_id, climate, species) %>%
summarise(
total_pears = n(),
good_quality_pears = sum(quality_index >= 60),
.groups = "drop"
) %>%
pivot_longer(cols = c(total_pears, good_quality_pears),
names_to = "Count_Type",
values_to = "Number_of_Pears")
ggplot(tree_yield, aes(x = climate, y = Number_of_Pears, fill = Count_Type)) +
geom_boxplot(position = position_dodge(width = 0.75), width = 0.6) +
facet_wrap(~species, nrow = 1, scales = "free_x") +
scale_fill_manual(values = c("good_quality_pears" = "#4575b4", "total_pears" = "#e0ecf4"),
labels = c("Good Quality Pears", "Total Pears")) +
labs(
title = "Distribution of Total vs. Good-Quality Pears per Tree",
subtitle = "By Climate Scenario and Species",
x = "Climate Scenario",
y = "Number of Pears",
fill = "Count Type"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), panel.grid.major.x = element_blank())

ggsave("pear_yield_plot.png", width = 12, height = 6, dpi = 300)
# Filter for only good quality pears
tree_yield_good <- tree_yield %>%
filter(Count_Type == "good_quality_pears")
# Plot only good-quality pears
ggplot(tree_yield_good, aes(x = climate, y = Number_of_Pears, fill = Count_Type)) +
geom_boxplot(position = position_dodge(width = 0.75), width = 0.6) +
facet_wrap(~species, nrow = 1, scales = "free_x") +
scale_fill_manual(values = c("good_quality_pears" = "#4575b4"),
labels = c("Good Quality Pears")) +
labs(
title = "Distribution of Good-Quality Pears per Tree",
subtitle = "By Climate Scenario and Species",
x = "Climate Scenario",
y = "Number of Good-Quality Pears",
fill = "Count Type"
) +
theme_minimal(base_size = 16) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid.major.x = element_blank())

# Save
ggsave("good_quality_pears_plot.png", width = 12, height = 6, dpi = 300)
ggplot(tree_yield_good, aes(x = climate, y = Number_of_Pears, fill = species)) +
geom_violin(alpha = 0.4, position = position_dodge(width = 0.9), trim = FALSE) +
geom_boxplot(width = 0.2, position = position_dodge(width = 0.9), outlier.shape = NA) +
geom_jitter(position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.9),
alpha = 0.3, size = 1) +
facet_wrap(~species, scales = "free_x") +
labs(title = "Good-Quality Pear Yield Distribution by Climate and Species",
x = "Climate Scenario", y = "Good-Quality Pears per Tree") +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

Question 2
(b) Analyze the outcome ‘size’ of the pears over time, taking
into account the covariates.
# Pivot to long format
data_long <- size_data %>%
pivot_longer(
cols = starts_with("week_"),
names_to = "week",
names_prefix = "week_",
values_to = "size"
) %>%
mutate(week = as.integer(week))
head(data_long)
## # A tibble: 6 × 7
## pear_id tree_id ecotr_id species climate week size
## <dbl> <dbl> <dbl> <chr> <chr> <int> <dbl>
## 1 1 1 1 Conference Scenario 1 0 0
## 2 1 1 1 Conference Scenario 1 1 0.6
## 3 1 1 1 Conference Scenario 1 2 0.8
## 4 1 1 1 Conference Scenario 1 3 1.2
## 5 1 1 1 Conference Scenario 1 4 2.3
## 6 1 1 1 Conference Scenario 1 5 3.1
# Average growth across all pears
ggplot(data_long, aes(x = week, y = size)) +
stat_summary(fun = mean, geom = "line", color = "blue", size = 1) +
stat_summary(fun.data = mean_se, geom = "ribbon", alpha = 0.2) +
labs(title = "Average Pear Size Over Weeks", y = "Size", x = "Week")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
* data seems close to linear.
ggplot(data_long, aes(x = week, y = size, group = pear_id)) +
geom_line(alpha = 0.3) +
labs(title = "Individual Growth Trajectories of Pear size", y = "Size", x = "Week")

ggplot(data_long, aes(x = week, y = size, group = pear_id, color = species)) +
geom_line(alpha = 0.6) +
facet_wrap(~ climate) +
labs(
title = "Pear Growth Over Time by Scenario and Species",
y = "Size",
x = "Week",
color = "Species"
) +
theme_minimal()

- Even though there’s overlap, the upper bounds of Conference growth
curves exceed those of Doyenne in almost every panel.
- The red lines (Conference) generally rise more steeply and reach
higher size values than the blue lines (Doyenne), especially evident
from weeks 10 onward.
- this show it is also likely a species effect, not just a
climate-specific one.
head(data_long)
## # A tibble: 6 × 7
## pear_id tree_id ecotr_id species climate week size
## <dbl> <dbl> <dbl> <chr> <chr> <int> <dbl>
## 1 1 1 1 Conference Scenario 1 0 0
## 2 1 1 1 Conference Scenario 1 1 0.6
## 3 1 1 1 Conference Scenario 1 2 0.8
## 4 1 1 1 Conference Scenario 1 3 1.2
## 5 1 1 1 Conference Scenario 1 4 2.3
## 6 1 1 1 Conference Scenario 1 5 3.1
# Plot
ggplot(data_long, aes(x = week, y = size, group = pear_id, color = climate)) +
geom_line(alpha = 0.2) + # individual pear growth
stat_smooth(aes(group = climate), method = "lm", se = FALSE, size = 1.2) + # smooth lines
facet_wrap(~ species) + # split by species
labs(title = "Pear Size Over Time by Climate Scenario and Species",
x = "Week", y = "Size") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
* Conference shows slightly faster growth across all scenarios as
compared to Doyenne Specie * Apparently, pears size in scenario one is
larger from 15 weeks onward across both species.
# Compute average size per week per species and climate
avg_growth <- data_long %>%
group_by(week, species, climate) %>%
summarise(mean_size = mean(size, na.rm = TRUE), .groups = "drop")
# Plot
ggplot(data_long, aes(x = week, y = size, group = pear_id, color = species)) +
geom_line(alpha = 0.3) + # individual lines
geom_line(data = avg_growth, aes(x = week, y = mean_size, group = species),
size = 0.5, linetype = "solid") + # average lines
facet_wrap(~ climate) +
labs(
title = "Pear Growth Over Time by Scenario and Species",
y = "Size",
x = "Week",
color = "Species"
) +
theme_minimal()
Observation:
- Intercept starts from zero ( all sizes equal to zero at week 0)
- Much variability within pears (Variability within growth rate of
individual pears)
- Fixed number of measurements per pears without dropout.
- measurements taken at fixed time points
- So balanced design
Suggested Model
As the intercept starts with 0 so we don’t need random intercept
There is a within pears variability So we’ll include random slops in the
model because we are interested in change of pear size over time (growth
rate), which is change in slope.
- Fixed effect Climate Scenarios & Species
- Random Effect Pear Id
- Our main concern it to check pear size (growth rate) over time.
- The growth rate might varies across climate scenarios so
**climate*week** (The effect of time (week) on pear size depends on
climate averaged across species)
- Also growth rate might varies across species **species*week**. (The
effect of time (week) on pear size depends on species averaged across
climate scenarios)
- climatespeciesweek : This tests whether
the pear size (growth rate) differs between species depends on climate
scenarios. The species difference in growth rate (for example Species 1
grows faster than Species 2) might be stronger in some climates than
others.
# let's export the data and prepare statistical model in sas instead of R
write.csv(data_long, "pear_growth_data.csv", row.names = FALSE)
head(data_long)
## # A tibble: 6 × 7
## pear_id tree_id ecotr_id species climate week size
## <dbl> <dbl> <dbl> <chr> <chr> <int> <dbl>
## 1 1 1 1 Conference Scenario 1 0 0
## 2 1 1 1 Conference Scenario 1 1 0.6
## 3 1 1 1 Conference Scenario 1 2 0.8
## 4 1 1 1 Conference Scenario 1 3 1.2
## 5 1 1 1 Conference Scenario 1 4 2.3
## 6 1 1 1 Conference Scenario 1 5 3.1
Model results and interpretation:
Pear Growth Chart
- First let’s interpret non significant effect so that we can stick
with the model without it.
- The interaction between weekclimatespecies
is highly non significant and effect size is very small, which shows the
pear size between species doesn’t depends on climate change.
- let’s fit the model after removing it and then interpret it

Fixed Effects Interpretation
- All climate scenarios show significant growth (p < 0.0001).
Scenario one with the fastest growth as compared to scenario 2 and
3.
- Conference pears outperform Doyenne consistently across all
climates
Random Effects
- Minimal variability between individual pears’ growth rates.
Variance = 0.00204
- Much Residual variance: High unexplained variability, suggesting
other factors ( soil quality, water) may influence size.
Type 3 Tests of Fixed Effects
- Conference grows significantly faster than Doyenne (F=481.78,
p<.0001).
- Overall effect: Growth rates differ significantly across climates
(F=9.24, p<.0001).